UnSync: A Soft Error Resilient Redundant CMP Architecture

Authors

  • Rida Bazzi
  • Georgios Fainekos
Abstract

Reducing device dimensions, increasing transistor densities, and smaller timing windows expose processors' vulnerability to soft errors induced by charge-carrying particles. Since these trends are inevitable as processor technology advances, the industry has been forced to improve the reliability of general-purpose Chip Multiprocessors (CMPs). With the availability of increased hardware resources, redundancy-based techniques are the most promising methods to eradicate soft-error failures in CMP systems. This work proposes UnSync, a novel customizable and redundant CMP architecture that uses hardware-based detection mechanisms (most of which are readily available in the processor) to reduce overheads during error-free execution. In the presence of errors (which are infrequent), a recovery mechanism that enables always-forward execution provides resilience in the system. Because the UnSync architecture framework inherently supports customization of the redundancy, it provides a means to achieve performance-reliability trade-offs in many-core systems. This work designs a detailed RTL model of the UnSync architecture and performs hardware synthesis to compare its hardware (power/area) overheads with those of Reunion, a state-of-the-art redundant multi-core architecture. It also performs cycle-accurate simulations over a wide range of SPEC2000 and MiBench benchmarks to evaluate the performance gained relative to the Reunion architecture. Experimental results show that, for the same level of reliability, UnSync reduces power consumption by 34.5% and improves performance by up to 20%, with 13.3% less area overhead, compared to Reunion.
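
The abstract does not describe UnSync at the mechanism level, so the following is only a toy model under loudly stated assumptions, not the authors' design. Two redundant cores run the same instruction stream. A per-register parity bit stands in for the "hardware-based detection mechanisms". Recovery is assumed to copy the architectural state of the error-free partner core into the faulty core, so execution keeps moving forward instead of rolling back to a checkpoint. All names (Core, run_pair, the three-operand add ISA, the fault tuple) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ISA: (dest, src1, src2) means dest = src1 + src2.
Instr = Tuple[str, str, str]
# Hypothetical fault description: (core index, cycle, register, bit position).
Fault = Tuple[int, int, str, int]


def parity(value: int) -> int:
    """Parity (0 or 1) of the set bits in a non-negative integer."""
    return bin(value).count("1") & 1


@dataclass
class Core:
    regs: Dict[str, int]
    pc: int = 0
    # One parity bit per register stands in for the hardware detectors
    # (an assumption; the paper's actual detection logic is not modeled).
    par: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.par = {r: parity(v) for r, v in self.regs.items()}

    def write(self, reg: str, value: int) -> None:
        self.regs[reg] = value
        self.par[reg] = parity(value)

    def error_detected(self) -> bool:
        return any(parity(v) != self.par[r] for r, v in self.regs.items())

    def step(self, program: List[Instr]) -> None:
        dest, a, b = program[self.pc]
        self.write(dest, self.regs[a] + self.regs[b])
        self.pc += 1

    def copy_state_from(self, other: "Core") -> None:
        self.regs = dict(other.regs)
        self.par = dict(other.par)
        self.pc = other.pc


def run_pair(program: List[Instr], init: Dict[str, int],
             fault: Optional[Fault] = None) -> Tuple[Dict[str, int], int]:
    """Run two redundant cores; return final registers and the number of recoveries."""
    cores = [Core(dict(init)), Core(dict(init))]
    recoveries = 0
    for cycle in range(len(program)):
        if fault is not None and fault[1] == cycle:
            idx, _, reg, bit = fault
            # Soft error: flip one bit without updating its parity bit.
            cores[idx].regs[reg] ^= 1 << bit
        for i, core in enumerate(cores):
            if core.error_detected():
                # Assumed forward recovery: overwrite the faulty core's state with
                # its error-free partner's state and continue (no rollback).
                core.copy_state_from(cores[1 - i])
                recoveries += 1
        for core in cores:
            core.step(program)
    if cores[0].regs != cores[1].regs:
        raise RuntimeError("redundant cores diverged")
    return cores[0].regs, recoveries


INIT = {"r0": 1, "r1": 2, "r2": 0, "r3": 0}
PROGRAM: List[Instr] = [("r2", "r0", "r1"), ("r3", "r2", "r2"),
                        ("r2", "r3", "r1"), ("r3", "r3", "r2")]
```

Calling run_pair(PROGRAM, INIT, fault=(1, 2, "r3", 0)) flips one bit of r3 in the second core. The parity check flags it, one state copy is performed, and both cores finish with the same registers as the fault-free run. The model assumes only one core is corrupted at a time. If both were hit, copying a partner's state would not help.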

Similar resources

The Granularity of Soft-Error Containment in Shared Memory Multiprocessors

Enables flexibility in when to detect. Case Study: HP NSAA. HP’s NonStop Advanced Architecture (NSAA), although not a shared-memory multiprocessor, uses the memory containment granularity. Before performing disk or network I/O, NSAA compares redundant executions. Recovery is accomplished by reverting to a software-created backup process. Recovery Across I/O. When coordinating checkpoints across al...

An error-resilient redundant subspace correction method

As we stride toward the exascale era, due to increasing complexity of supercomputers, hard and soft errors are causing more and more problems in high-performance scientific and engineering computation. In order to improve reliability (increase the mean time to failure) of computing systems, a lot of efforts have been devoted to developing techniques to forecast, prevent, and recover from errors...

Fingerprinting Across On-Chip Memory Interconnects

Pairs of cores in a chip multiprocessor (CMP) can execute programs redundantly to detect and recover from soft errors. Prior work assumes dedicated cross-core buses to compare the redundant cores’ outputs for error detection. In this paper, we investigate using the CMP’s existing on-chip memory interconnect for comparing hashes of architectural state updates, called fingerprints, across redunda...
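
The snippet above describes comparing hashes of architectural state updates ("fingerprints") across redundant cores. The following is a minimal sketch under assumptions that do not come from the paper. The hash is CRC-32 (zlib.crc32 from Python's standard library), and each update is encoded as a fixed-width (register number, value) record. Update, fingerprint, and fingerprints_match are hypothetical names.

```python
import zlib
from typing import Iterable, Tuple

# Hypothetical encoding of one architectural state update: (register number, new value).
Update = Tuple[int, int]


def fingerprint(updates: Iterable[Update]) -> int:
    """Fold a stream of state updates into one 32-bit CRC fingerprint."""
    fp = 0
    for reg, value in updates:
        # Fixed-width record: 1 byte for the register, 8 bytes for the value
        # (masked to 64 bits so negative values are also representable).
        record = reg.to_bytes(1, "little") + (value & 0xFFFF_FFFF_FFFF_FFFF).to_bytes(8, "little")
        fp = zlib.crc32(record, fp)  # running CRC over the whole comparison interval
    return fp


def fingerprints_match(core_a: Iterable[Update], core_b: Iterable[Update]) -> bool:
    """Compare only the 4-byte fingerprints, as redundant cores would over the interconnect."""
    return fingerprint(core_a) == fingerprint(core_b)
```

The design choice behind fingerprinting is bandwidth. Each interval costs one 4-byte exchange instead of one message per update. The trade-off is a small chance that two different update streams produce the same fingerprint. CRC-32 does catch every single-bit flip, so a lone corrupted bit in a value always produces a mismatch.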

Assessing SEU Vulnerability via Circuit-Level Timing Analysis

Recently, there has been a growing concern that, in relation to process technology scaling, the soft-error rate will become a major challenge in designing reliable systems. In this work, we introduce a high-fidelity, high-performance simulation infrastructure for quantifying the derating effects on soft-error rates while considering microarchitectural, timing and logic-related masking, using re...

Protecting Data Against Isolated Defects and Soft Errors Using Low delay Single Error Correction Codes

In this paper, “Protecting data against isolated defects and soft errors using low delay single error correction codes,” we propose that memories can be protected from soft errors by using error correction codes. Single error correction (SEC) codes correct a 1-bit error per word. In some cases, SEC codes are extended to also detect double errors; these are known as single error correction double error detection...
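
The snippet above contrasts SEC codes with their SEC-DED extension. The following is a minimal sketch of that idea using an extended Hamming (8,4) code over a 4-bit data word. It is a textbook construction, not the low-delay code from the cited paper. Real memories protect wider words, for example 64 data bits with 8 check bits. The encode and decode names are hypothetical.

```python
from typing import List, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions, with check bits at 1, 2, 4 and data at 3, 5, 6, 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]   # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]   # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]   # covers positions with bit 2 set
    c[0] = sum(c[1:]) & 1       # overall parity makes the total even
    return c


def decode(codeword: List[int]) -> Tuple[int, str]:
    """Return (data, status); status is "ok", "corrected", or "double_error"."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos     # XOR of set positions is 0 for a valid codeword
    overall = sum(c) & 1        # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single error: the syndrome names the bad position (0 means c[0] itself).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two flips, detectable only.
        return -1, "double_error"
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return data, status
```

A plain SEC code would miscorrect a double error as if it were a single one. The extra overall parity bit is what distinguishes the two cases, at the cost of one more check bit per word.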

Publication date: 2011